Exploring Visualization Capabilities of R

R offers extensive visualization capabilities.
The following sections explore the base plot() function, the ggplot2 package, and animated maps built with the magick package.

Introduction to the plot() Function

plot() is the basic plotting function in R. It draws two-dimensional graphs and charts and is often used to visualize the relationship between data variables. Scatter plots and line graphs are the graphs most commonly produced with it.

The generic syntax for the plot function is (note that R is case-sensitive, so the name is all lowercase):

plot(x, y, ...)

A fuller call with commonly used arguments is:

plot(x, y, type, main, sub, xlab, ylab)

The type argument sets the kind of plot you want to see:

"p": points
"l": lines
"b": both points and lines
"c": the lines part of "b" alone, with gaps left where the points would be
"o": both lines and over-plotted points
"h": histogram-like vertical lines
"s": stair steps
"n": no plotting

The remaining arguments label the plot:

main: main title
sub: subtitle
xlab: x-axis label
ylab: y-axis label
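To see how the type values differ, the sketch below draws the same data once per type. The x and y values are made up for illustration, and the titles and axis labels are placeholders.

```r
# Made-up sample data, used only to illustrate the arguments
x <- 1:10
y <- c(3, 5, 4, 8, 7, 9, 6, 10, 12, 11)

plot_types <- c("p", "l", "b", "c", "o", "h", "s", "n")

# Show the eight plots in a 2 x 4 grid and remember the previous settings
old_par <- par(mfrow = c(2, 4))
for (tp in plot_types) {
  plot(x, y,
       type = tp,
       main = paste("type =", tp),   # main title
       sub  = "sample data",         # subtitle
       xlab = "x values",            # x-axis label
       ylab = "y values")            # y-axis label
}
par(old_par)  # restore the previous layout
```

If the plotting window is small, R may complain that the figure margins are too large; enlarging the window or using a smaller grid avoids this.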

Example Exercise

We have 10 students in two different courses and their grades for their recent exam.
The X variable denotes the first course and the Y variable denotes the second course.

X = 40, 15, 50, 12, 22, 29, 21, 35, 14, 15
Y = 41, 42, 32, 14, 42, 27, 13, 50, 33, 22

Put the grades into R vectors and create a line plot using type "l".

First, store the grades for X in a vector, then pass that variable to plot() with type "l" to draw a line plot.

X <- c(40, 15, 50, 12, 22, 29, 21, 35, 14, 15)
plot(X, type = "l")

Next, store the grades for Y in a vector and pass that variable to plot() with type "p" to draw a points plot.

Y <- c(41, 42, 32, 14, 42, 27, 13, 50, 33, 22)
plot(Y, type = "p")
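The two vectors can also be shown together. The sketch below overlays both courses on one chart and then plots X against Y as a scatter plot, which is one way to look for a relationship between the two sets of grades. The titles, colors, and legend text are illustrative choices.

```r
X <- c(40, 15, 50, 12, 22, 29, 21, 35, 14, 15)
Y <- c(41, 42, 32, 14, 42, 27, 13, 50, 33, 22)

# Overlay both courses: X as a line, Y as points on the same axes
plot(X, type = "l", ylim = range(c(X, Y)),
     main = "Exam grades by student", xlab = "Student", ylab = "Grade")
points(Y, pch = 19, col = "red")
legend("topright", legend = c("Course X", "Course Y"),
       lty = c(1, NA), pch = c(NA, 19), col = c("black", "red"))

# Scatter plot of one course against the other
plot(X, Y, type = "p", main = "Course X vs. course Y",
     xlab = "Grade in course X", ylab = "Grade in course Y")

# Pearson correlation coefficient between the two sets of grades
r <- cor(X, Y)
print(r)
```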

There are many ways to use the basic plot() function to your advantage.
Below is a representation of other visualization capabilities within the default plot function.


Source: Kumar, 2020

Case Study 1 - using plot()

This case study explores the visualization features of an R package developed by Dr. Guangchuang Yu of Southern Medical University.

The package gives us access to the latest and historical COVID-19 case data for all countries, plots the data on a map, and creates various graphs.

We can access the data by first installing the package from GitHub.

Package: nCov2019
By: Dr. Guangchuang Yu (Southern Medical University)

remotes::install_github("GuangchuangYu/nCov2019")  

library(nCov2019)
get_nCov2019()
load_nCov2019()

Then we load the packages needed for the analysis. require() returns FALSE with a warning if a package is not installed, so it doubles as a check that the installation worked.

require(nCov2019)
require(dplyr)
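If you only want to check what is installed without attaching anything, a sketch along these lines may help. It uses base R's requireNamespace(); the variable names are illustrative.

```r
needed <- c("nCov2019", "dplyr")

# requireNamespace() returns TRUE or FALSE and does not attach the package
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
print(installed)

missing_pkgs <- needed[!installed]
if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
}
```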

Now we can get a first impression of the dataset.

Despite the similar names, get_nCov2019() and load_nCov2019() are functions of the nCov2019 package, not R's built-in get() and load().

Assigning the result of get_nCov2019() to x triggers a download of the latest COVID-19 statistics.
Assigning the result of load_nCov2019() to y loads the historical COVID-19 data.

x <- get_nCov2019()
y <- load_nCov2019()

We can then check the results by printing x and y. x reports the total number of confirmed cases in China and when the data was last updated, and y reports when the historical data was last updated.

x
  
China (total confirmed cases): 95901
last update: 2020-12-21 20:45:32
y
  
nCov2019 historical data 
last update: 2020-11-26 

We can also easily check worldwide statistics for details on confirmed cases, deaths, and so on.
In the output below, China is listed first, followed by the other countries in descending order of confirmed cases.

x['global']
    name           confirm   suspect dead    deadRate  showRate  heal
1   China          95901      7      4771    4.97      FALSE     89480
2   United States  18277433   0      324898  1.78      FALSE     10622096
3   India          10055560   0      145810  1.45      FALSE     9606111
4   Brazil         7238600    0      186764  2.58      FALSE     6409986
5   Russia         2850042    0      50723   1.78      FALSE     2273510
6   France         2529756    0      60665   2.4       FALSE     189638
7   United Kingdom 2079564    0      67718   3.26      FALSE     4380
8   Turkey         2043704    0      18351   0.9       FALSE     1834705
9   Italy          1964054    0      69214   3.52      FALSE     1281258
10  Spain          1817448    0      48926   2.69      FALSE     196958
11  Argentina      1541285    0      41813   2.71      FALSE     1368346
12  Germany        1531998    0      26655   1.74      FALSE     1129280
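The printed result looks like an ordinary data frame, so the usual data frame tools can be tried on it. The sketch below copies four rows from the table above into a stand-alone data frame (so it runs without the package), sorts them by confirmed cases, and recomputes the death rate as deaths per 100 confirmed cases. That deadRate is derived this way is an assumption, although it reproduces the values shown for these rows.

```r
# Four rows copied by hand from the table above
cases <- data.frame(
  name    = c("China", "United States", "India", "Brazil"),
  confirm = c(95901, 18277433, 10055560, 7238600),
  dead    = c(4771, 324898, 145810, 186764),
  stringsAsFactors = FALSE
)

# Sort by confirmed cases, largest first
sorted <- cases[order(cases$confirm, decreasing = TRUE), ]

# Deaths per 100 confirmed cases, rounded to two decimals
sorted$dead_rate <- round(sorted$dead / sorted$confirm * 100, 2)
print(sorted)
```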

A static heat map using plot()

plot() is a generic function, so a package can supply its own plotting method for the objects it creates. For the nCov2019 data, this means plot() can draw the data on a map.
Since x already holds the result of get_nCov2019(), we can pass it to plot() to get a static heat map.

plot(x)
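To illustrate how plot() chooses what to draw based on the class of its argument, the sketch below defines a hypothetical class, case_summary, with its own plot method. It is not how nCov2019 implements its map; it only shows the mechanism.

```r
# A hypothetical class holding a few values from the worldwide table
summary_obj <- structure(
  list(name = c("China", "India", "Brazil"),
       confirm = c(95901, 10055560, 7238600)),
  class = "case_summary"
)

# plot() dispatches to this function for objects of class "case_summary"
plot.case_summary <- function(x, ...) {
  barplot(x$confirm, names.arg = x$name,
          main = "Confirmed cases", ...)
  invisible(x)
}

plot(summary_obj)  # calls plot.case_summary()
```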

Introduction to ggplot2

Unlike the plot function, which ships with R by default, ggplot2 is a separate package that you install. It lets you create visualizations declaratively: you supply the data and tell ggplot2 how to map variables to aesthetics.

ggplot2 is a package that makes it simple to create complex plots from data in a data frame.
It provides a more programmatic interface for specifying what variables to plot, how they are displayed, and general visual properties. Therefore, we only need minimal changes if the underlying data change or if we decide to change from a bar plot to a scatterplot.
This helps in creating publication quality plots with minimal amounts of adjustments and tweaking.
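As a minimal sketch of that idea, the example below builds a bar plot and a scatter plot from the same mapping, changing only the geom layer. It assumes ggplot2 is installed; the data frame holds a few values copied from the worldwide table in Case Study 1, and the object names are illustrative.

```r
library(ggplot2)

# A few values copied from the worldwide table in Case Study 1
cases <- data.frame(
  name    = c("United States", "India", "Brazil", "Russia"),
  confirm = c(18277433, 10055560, 7238600, 2850042)
)

# The data and the aesthetic mapping are declared once
base <- ggplot(cases, aes(x = name, y = confirm))

# Switching the kind of plot only changes the geom layer
p_bar   <- base + geom_col()
p_point <- base + geom_point()

print(p_bar)
print(p_point)
```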

Below are some examples of what ggplot2 can do. Compared with plot(), it offers noticeably more control over aesthetics and is more programmable.



Source: Holtz, 2020

Case Study 2 - using ggplot()

Based on the same data as Case Study 1, it is possible to extract the 10 countries with the most confirmed cases and plot them with ggplot2.

# obtain the top 10 countries by cumulative confirmed cases
library(dplyr)
d <- y['global']                 # extract global historical data
d <- d[d$country != 'China', ]   # exclude China
n <- d %>% filter(time == time(y)) %>%   # keep only the latest date in the dataset
  top_n(10, cum_confirm) %>%
  arrange(desc(cum_confirm))

# plot the top 10 from Feb 01 to the most recent date in the dataset
library(ggplot2)
library(ggrepel)
ggplot(filter(d, country %in% n$country, time > '2020-02-01'),
       aes(time, cum_confirm, color = country)) +
  geom_line() +
  # label each line once, at the latest date
  geom_text_repel(aes(label = country),
                  data = function(d) d[d$time == time(y), ])

This results in a colorful, easy-to-read line graph like the one below.

This graph shows the 10 countries with the most confirmed cases outside of China. The United States and India have the most confirmed cases and show exponential growth. Meanwhile, the seven countries ranked below Brazil have flattened their curves and slowed the growth with effective containment strategies. Some European countries, such as France and Italy, are also seeing hundreds and thousands of new cases.
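The selection step in the code above (keep the latest date, take the top n, sort in descending order) can be tried on a small made-up data set. The countries A, B, and C and their counts are hypothetical, and max(time) stands in for the package's time(y) as the latest date. The sketch assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical cumulative counts for three countries on two dates
history <- data.frame(
  country     = rep(c("A", "B", "C"), each = 2),
  time        = as.Date(rep(c("2020-03-01", "2020-03-02"), times = 3)),
  cum_confirm = c(10, 20, 5, 50, 7, 30),
  stringsAsFactors = FALSE
)

latest_top <- history %>%
  filter(time == max(time)) %>%    # keep only the latest date
  top_n(2, cum_confirm) %>%        # two largest cumulative counts
  arrange(desc(cum_confirm))       # largest first

print(latest_top)
```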

A gauge plot - Incubation Time

Below is a gauge-style plot made with ggplot2.


Source: Kanevsky, 2020

A bar chart - Spectrum of Illness Severity


Source: Kanevsky, 2020

A bar chart - Clinical Manifestations


Source: Kanevsky, 2020

A bar & line chart - Case Fatality Rate (CFR) by Age Groups


Source: Kanevsky, 2020

A bar chart with timeline - Period of Infectivity


Source: Kanevsky, 2020

Animated growth of confirmed cases

Rather than a static map, maps can also be animated to show change over time using the magick R package.

Using the same variables set previously, it is possible to create a moving heat map.
The moving heat map below shows the development of COVID-19 from February 1st, 2020 to March 31st, 2020.
It is possible to see that the virus originated in China and spread across the world.

install.packages("magick")

library(magick)
library(nCov2019)

y <- load_nCov2019()   # historical data

# every day from 2020-02-01 to 2020-03-31 as zero-padded date strings
d <- format(seq(as.Date("2020-02-01"), as.Date("2020-03-31"), by = "day"))

# open a magick graphics device; each printed plot becomes one frame
img <- image_graph(1200, 700, res = 96)
out <- lapply(d, function(date){
  p <- plot(y, date=date,
            label=FALSE, continuous_scale = TRUE)
  print(p)
})
dev.off()

# combine the frames into an animation at 2 frames per second
animation <- image_animate(img, fps=2)
print(animation)
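The same capture-and-animate pattern works with any plot. The sketch below assumes the magick package is installed and animates five frames of a made-up series of counts; the data, frame size, and output file name are illustrative.

```r
library(magick)

# Made-up cumulative counts, one value per day
counts <- c(1, 2, 4, 8, 16)

# Open a magick graphics device; each new plot becomes one frame
frames <- image_graph(600, 400, res = 96)
for (day in seq_along(counts)) {
  plot(seq_len(day), counts[seq_len(day)], type = "b",
       xlim = c(1, length(counts)), ylim = range(counts),
       main = paste("Day", day), xlab = "Day", ylab = "Cumulative cases")
}
dev.off()

# Play the frames at 2 frames per second and save them as a GIF
animation <- image_animate(frames, fps = 2)
image_write(animation, "growth.gif")
```

Writing the result to a file with image_write() makes it easy to embed the animation in a report or web page.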

List of References

Holtz, Y., 2020. Data Visualization With R And Ggplot2. [online] R-graph-gallery.com. Available at: https://www.r-graph-gallery.com/ggplot2-package.html.

Kanevsky, G., 2020. Facts About Coronavirus Disease 2019 (COVID-19) In 5 Charts Created With R And Ggplot2. [online] Novyden.blogspot.com. Available at: https://novyden.blogspot.com/2020/03/facts-about-coronavirus-disease-2019.html.

Kumar, P., 2020. Understanding Plot() Function In R. [online] JournalDev. Available at: https://www.journaldev.com/36083/plot-function-in-r.

Pedersen, T., 2020. Ggplot2 Package. [online] Rdocumentation.org. Available at: https://www.rdocumentation.org/packages/ggplot2/versions/3.3.2.

EDUCBA, 2020. Plot Function In R. [online] Available at: https://www.educba.com/plot-function-in-r/.

Qian, X., 2020. Visualize The Pandemic With R #COVID-19. [online] Medium. Available at: https://towardsdatascience.com/visualize-the-pandemic-with-r-covid-19-c3443de3b4e4.